Loss is its own Reward: Self-Supervision for Reinforcement Learning

Authors

  • Evan Shelhamer
  • Parsa Mahmoudieh
  • Max Argus
  • Trevor Darrell
Abstract

Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses. These losses offer ubiquitous and instantaneous supervision for representation learning even in the absence of reward. While current results show that learning from reward alone is feasible, pure reinforcement learning methods are constrained by computational and data efficiency issues that can be remedied by auxiliary losses. Self-supervised pre-training and joint optimization improve the data efficiency and policy returns of end-to-end reinforcement learning.
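The joint optimization the abstract describes can be sketched as a reward-driven loss plus a weighted self-supervised auxiliary loss. The sketch below is illustrative only: the function names, the forward-dynamics auxiliary task, and the `aux_weight` coefficient are assumptions, not the paper's exact losses.

```python
import numpy as np

def policy_gradient_loss(log_probs, returns):
    # Reward-driven surrogate (REINFORCE-style): -E[log pi(a|s) * R].
    return -np.mean(log_probs * returns)

def dynamics_aux_loss(pred_next_states, next_states):
    # Self-supervised forward-dynamics task: predict the successor state
    # from (state, action); supervision is available even without reward.
    return np.mean((pred_next_states - next_states) ** 2)

def joint_loss(log_probs, returns, pred_next, actual_next, aux_weight=0.1):
    # Joint objective: the auxiliary loss shapes the representation
    # alongside the reward signal.
    return (policy_gradient_loss(log_probs, returns)
            + aux_weight * dynamics_aux_loss(pred_next, actual_next))
```

In practice the policy network and the auxiliary prediction head would share an encoder, so gradients from both terms update the same representation.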


Similar Articles


Modelling structural relations of craving based on sensitivity to reinforcement, distress tolerance and self-compassion with the mediating role of self-efficacy for quitting

Background & Objectives: Craving is a major barrier to the effective treatment of substance addiction. This study was conducted to model structural relations of craving based on sensitivity to reinforcement, distress tolerance and self-compassion, with the mediating role of self-efficacy for quitting. Materials and Methods: The method of this study was descriptive-correlational. The...


Modeling Others using Oneself in Multi-Agent Reinforcement Learning

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players’ hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self O...


Hierarchical Reinforcement Learning Based Self-balancing Algorithm for Two-wheeled Robots

Abstract: Self-balancing control is the basis for applications of two-wheeled robots. In order to improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for controlling the balance of two-wheeled robots. After describing the subgoals of hierarchical reinforcement learning, we extract features for subgoals, define a feature value vector and it...


Autonomous Learning of Reward Distribution for Each Agent in Multi-Agent Reinforcement Learning

A novel approach for reward distribution in multi-agent reinforcement learning is proposed. The agent who gets a reward gives a part of it to the other agents. If an agent gives a part of its own reward to the others, they may help the agent to get more reward. There may be some cases in which the agent gets more reward than it gave to the others. In this case, it is better for...



Journal:
  • CoRR

Volume abs/1612.07307  Issue -

Pages -

Publication year 2016